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where  $  represents  the  univariate  normal  distribution  function.  This  special  case 
is  the  basis  for  much  of  item-response  theory.  More  recently,  however,  Bohrer  and 
Schervish  (1981),  have  developed  an  error  bounded  algorithm  for  evaluating  Fn  for 
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point  operations  per  second. 
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ABSTRACT 


The  probability  integral  of  the  multivariate  normal  distribution  has  received 
considerable  attention  since  Sheppard  (1900)  and  Pearson  (1901)  published 
their  seminal  work  on  the  bivariate  normal  distribution. ^In  the  general  case, 
we  are  concerned  with  evaluating 


/h\  rht  rnn 
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where  {/>,_, }  represents  the  n  x  n  correlation  matrix  of  the  x,‘s.  and  f(i\.  x2 . xn\  {ptJ } ) 

is  the  standardized  multivariate  normal  density  function.  Direct  evaluation  of 
is  only  possible  for  special  cases  of  {/>,;}.  For  example.  Dunnett  and  Sobel 
t  l!)oo)  have  shown  that  when  ptJ  —  a,a,(i  ^  j).  where  |  a,  |<  1.  then 


F-xlhx.h; . hn:  {pl}})  =  J 
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where  $  represents  the  univariate  normal  distribution  function.  This  special 
case  is  the  basis  for  much  of  modern  psychometric  theory.  More  recently, 
however.  B<  lirer  and  Schervish  (I9$l).  have  developed  an  error-bounded  algo¬ 
rithm  for  evaluating  Fn  for  general  {p,j}.  Computationally,  this  algorithm  is 
restricted  to  n  =  7.  and  even  at  n  =  7.  it  can  require  as  much  as  24  hours  to 
compute  a  single  probability  with  10"  5  accuracy  on  a  computer  than  is  capable 
of  approximately  1-2  million  scalar  floating  point  operations  per  second: 
'T^y.The  put  pose  of  this  report  is  to  present  a  fast  and  general  approximation 
for  rectangular,  regions  of  the  multivariate  normal  distribution  function  based 
on  (.’lark's1  approximation  to  the  moments  of  the  maximum  of  n  jointly 

normal  random  variables.  The  performance  of  this  approximation  compared  to 
special  cases  in  which  the  exact  results  are  known  and  error-bounded  reduction 
formulae  show  the  accuracy  of  the  approximation  to  be  adequate  for  many, 
practical  applications  where  multivariate  normal  probabilities  are  required.  ( 


H  ) 


1  Introduction 


The  probability  integral  of  the  multivariate  normal  distribution  has  received 
considerable  attention  since  Sheppard  (1900)  and  Pearson  (1901)  published 
their  seminal  work  on  the  bivariate  normal  distribution.  In  the  general  case, 
we  are  concerned  with  evaluating 
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where  {p,,}  represents  the  n  x  n  symmetric  correlation  matrix  of  the  jq's.  and 
fu’i.X' . rn;{p,,})  is  the  standardized  multivariate  normal  density  func¬ 

tion.  Direct  evaluation  of  Fn  is  only  possible  for  special  cases  of  {pi;}.  For 
example.  Dunnett  and  Sobel  i  195>)  have  shown  that  when  pi;  =  a,a;(t  ^  j). 
where  '  o,  |<  1.  then 


f(y)d{y) 


where  0  represents  the  univariate  normal  distribution  function.  The  probabil 
ity  in  equation  (2)  can  be  approximated  to  any  practical  degree  of  accuracy 
using  (jauss- Hermite  quadrature  i  Stroud  and  Sechrest.  1966).  It  should  be 
noted  that  when  ptj  =  p  for  all  i.j.  then 


F^li.  h 


/i  )df  y  i 


and  if  p  -  and  h  -  0. 
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More  recently,  however,  Bohrer  and  Schervish  (1981).  have  developed  an 
error-bounded  algorithm  for  evaluating  Fn  for  general  { p;J } .  Computationally, 
this  algorithm  is  restricted  to  n  =  7.  and  even  at  n  —  7.  it  can  require  as  much 
as  24  hours  to  compute  a  single  probability  with  10~3  accuracy  on  a  computer 
than  is  capable  of  approximately  1-2  million  scalar  floating  point  operations 
per  second.  It  is  unclear  whether  vectorization  of  this  algorithm  is  possible,  so 
that  the  greatly  increased  speeds  of  parallel  computing  environments  could  be 
exploited  I  t-.g..  20-80  million  floating  point  instructions  per  second).  Even  still, 
it  is  unlikely  that  this  algorithm  would  be  computational  tractable  for  n  >  10. 


An  alternate  approach  to  approximating  Fn,  can  be  obtained  by  noting 
that. 


Fn  =  Pr(x i  <  h i.x2  <  hi - in  <  hn) 


(5) 


If  hx  ■  ■  ■  hn  =  h  =  0.  and  the  x,  follow  a  standardized  multivariate  normal 
distribution.  F°  is  a  so-called  "orthant”  probability.  Note,  however,  that  this 
orthant  probability  is  equivalent  to 


F'J  =  Fr  {rnail  X| . vn  )  <  0}  (6) 

If  mai(ii . i'n  i  were  normally  distributed,  which  it  clearly  is  not.  with  mean 

E  mai(i, . rn')j  ami  variance  l  [marli, . xn)].  then. 
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where  in  this  case  li  =  0.  For  the  more  general  rectangular  region  case  of  ht. 
we  could  set  h  =  0  and  subtract  /?,  from  the  mean  values  of  each  of  the  x,. 
which  to  this  point  have  been  expressed  in  standardized  form. 

To  use  this  algorithm,  we  must  Hrst  have  an  accurate  method  of  computing 

the  Hrst  two  moments  of  mam, . r„ j  where  the  x,  have  a  joint  multivariate 

normal  distribution  with  general  correlation  { } .  and  some  bound  on  the  error 
introduced  by  assuming  that  mail  r,. ...  xn)  has  a  normal  distribution.  Such 
an  approximation  has  been  described  by  (.’lark  (1961).  and  in  the  following. 

we  describe  its  use  m  connection  with  evaluating  /up  f\.i2 . x\:  {p,, }).  We 

begin  by  reviewing  Clark's  original  formulae. 


2  The  Clark  Algorithm 

Let  any  three  successive  components  from  an  n-variate  vector.  y,.  be  dis- 
t  ributed: 


’  <h 

’  -) 

j 
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Let  y,  =  m ax(y,)  =  y,.  and  compute  the  probability  that  y,+ 1  >  y,  as 
follows: 

set  =  in,  -//,+i  1/C-hi- 

where  Tc !  =  *,+i  /V.*i  ■ 


=  P(y,+ 1  -  y  >  0) 

=  *(--.+i) 


■) 


Then  P(y,  + 1  >  y  I 


the  value  of  the  univariate  normal  distribution  function  at  the  standard  deviate 

--i+i- 

Now  let  yt+i  =  max(y,.  y1  +  1 )  and  assume  (as  an  approximation)  that  (yt+2«  y>+i ) 
is  bivariate  normal  with  means. 

/t  (  U 1 4*  2  )  bVi  *  2  )  “  P  i  +  2  /  g  \ 

p(y,+\)  ~  ; '//,  +  !)  =  Ji.tyr.+i)  + /i1+l<&(-rt+1)  +  <,\+iO(-.+ih 

variances 

'72(y,+2)  = 

<72(y.  +  i)  = 

where 

i  ( y,'+i )  =  ( df  ■+■  'T,‘  r<  +  i  1  "*■  ( /'ht-i  "*■  *,*+!  —  -i  +  i )  +  U*i  +  /d+i  )C.+iO(  -i+i  )• 

(10) 

and  correlation 

,  -  .  +  -i  +  l  )+  ^i  +  lPi  +  l.  >  +  2$(  — -i  +  l ) 
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Then. 

P  (  '/,  +  .>-  maxi  »/...»)  I  =  /?  1 1  (A f  2  y i  + 1  >  0)  n  (yl+2  -  y,  >  0)  I 

is  approximated  ;>y 
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Assuming  as  a  working  approximation  that  y,  +  t  is  normally  distributed 
with  the  above  mean  and  variance,  we  may  therefore  proceed,  recursively  from 
i  =  1  to  i  =  n  —  l.  where  ;/„*!  is  an  independent  dummy  variate  with  mean 
zero  and  variance  zero  (i.e.  yn+1  =  0).  Then,  for  example. 


Ti y;,.,)  -  £"iy.-2 
+  i.y,A, )  -  5 *'(//,+ 1 


1 


P(^n+t  =  max(i/1,y2 . ih+i)] 

=  p  [(yn+1  -yi  >  0)  n  (yn+1  -y2  >  0)  n . . .  n  {yn+l  -  yn  >  0)] 

=  P  {(—y  i  >  0)  n  (  —y2  >o)n...n(— yn  >0)]  (14) 

approximates  the  probability  of  the  negative  orthant  of  the  specified  multi¬ 
variate  normal  distribution.  In  the  case  of  n  correlated  standard  normals,  the 
negative  and  positive  orthant  probabilities  are  identical.  The  probability  of  any 
other  orthant  can  be  obtained  by  reversing  the  signs  of  the  variates  correspond¬ 
ing  to  l's  in  the  orthant  pattern.  Of  course.  y,+i  is  not  normally  distributed. 
Errors  produced  by  substituting  normal  approximations  for  the  moments  of 
t/,. t  are  discussed  in  the  following  section. 

More  generally,  to  compute  a  multivariate  normal  probability  over  an  n 
dimensional  rectangular  region,  for  example. 

J  J  ■  ■  ■  J  /( 'f  i .  -r i . .iv.  {p,j  })dx ,  .  . .  din  (15) 

we  compute  the  negative  orthant  setting  =  h.  Finally,  to  approximate  the 
integral  tor  general  /?,.  we  compute  the  negative  orthant  by  setting  ^n+i  =  0 
and  /(,  =  /(,  -  /?,. 

3  Accuracy  of  the  Clark  Approximation 

The  errors  of  the  Clark  approximation  result  from  the  replacement  of  non¬ 
normal  distributions  by  normal  approximations.  For  example,  suppose  that 
we  are  interested  in  the  maximum  of  four  standard  normal  variables,  i.t.. 
m<t.r(t/|.  y2. J/3- .1/4  )•  By  assuming  that  y2  is  normally  distributed  with  expected 
value  E[max(y\.  y2)}  and  variance  y2)}.  we  can  then  use  the  mo¬ 

ments  of  nuu'd'/i.  y-i)  as  an  approximation  for  those  of  max{yl.  y2.  y2).  Next, 
we  assume  that  f/i  is  normally  distributed  with  expectation  and  variance  equal 
to  the  corresponding  moments  of  mai{y2.  y2).  and  can  therefore  use  the  mo¬ 
ments  of  max{y2,  yA )  as  an  approximation  for  those  of  max(  tq . ).  In  this 

example,  of  course.  y2  and  ;/3  are  not  normally  distributed.  Furthermore,  this 
is  a  rare  case  in  which  the  distribution  of  a  statistical  variate  diverges  from 
normality  as  sample  size  increases.  Tippet  (1925)  first  showed  that  skewness 
and  kurtosis  of  the  maximum  of  n  standard  normals  goes  form  .019  and  .62 
respectively  for  n  =  2  to  .429  and  .765  for  n  =  100  to  .618  and  1.088  for  n  = 
1000.  In  terms  of  expected  values  of  n  standard  normal  variables,  the  effect  of 
this  non-normality  is  quite  small.  For  example,  for  n  =  10.  the  true  value  is 


1.5388  and  the  approximation  yields  1.5367.  Even  for  n  =  1000  the  expected 
value  is  3.2414  and  the  approximated  value  is  3.2457. 

The  effect  of  non-normality  on  the  accuracy  of  the  approximation  is  also 
dependent  on  the  difference  between  E{yt-i,yt).  For  example,  suppose  we 
wish  to  approximate  the  moments  of  max(yl.  y2)  where  y\  and  y2  are  not 
normally  distributed.  Clark  (1961)  points  out  that  if  the  difference  E{y\)  — 
Eiiji)  is  large  relative  to  the  greater  of  l  ’C2(y,)  and  \'lF(y2)  the  random 
variable  max(yi.y2)  is  almost  identical  to  y{.  Certainly  the  first  two  moments 
of  max(yl.  y2)  would  be  minimally  affected  bv  replacing  y  1  and  y2  by  normal 
approximations.  However  if  E(yx)  —  E(y2)  is  small  relative  to  the  respective 
standard  deviations,  then  the  use  of  normal  approximations  could  conceivably 
result  in  significant  errors  in  the  approximation  of  the  mean  and  variance  of 
their  maximum. 

In  light  of  this,  the  following  illustrations  of  the  accuracy  of  the  Clark 
approximation  are.  in  fact,  the  worst  case  results,  since  they  represent  the  case 
in  which  the  expected  values  of  the  //,  are  equal.  These  results  indicate  that  the 
error  bound  for  the  Clark  approximation  is  approximately  10"C  as  illustrated 
in  the  following  section. 


4  Illustration 

To  evaluate  the  performance  of  this  algorithm,  we  have  examined  a  series  of 
equa-correiated  multivariate  normal  distributions  for  which  exact  results  are 
known  1  see  Gupta.  1963)  and  those  considered  by  Schervish  (1984).  Table  1A 
displays  results  tor  3  to  7  equa-correlated  standard  normal  random  variables 
with  selected  values  of  p  =  .2.  .3.  .8  and  .9.  and  upper  integration  bounds 
of  0.  1.  and  2.  Inspection  of  the  tabled  probabilities  reveals  that  the  Clark 
algorithm  is  generally  accurate  to  at  least  10“ !  and  that  computational  times 
are  a  linear  function  of  dimensionality.  The  speed  of  the  Clark  algorithm  does 
not  depend  on  p.  In  contrast,  the  speed  of  MCLNOR  (Schevish.  1984)  is  expo¬ 
nentially  increasing  with  both  dimensionality  and  p.  In  the  7-variate  normal 
case  with  ptJ  =  .9.  Ml  LNOR  required  almost  a  day  to  compute  a  probability 
which  was  accurate  to  2  x  10~\  whereas  the  Clark  algorithm  computed  the 
same  probability  with  4  x  10~4  accuracy  in  less  than  three  thousandths  of  a 
second.  Inspection  of  these  results  and  others  not  reported  here,  suggest  that 
the  accuracy  of  the  Clark  approximation  increases  with  increasing  p. 

Table  IB  displays  results  for  orthant  probabilities  of  higher  dimensional 
integrals  (n  =  10.  20.  and  40).  for  the  special  case  of  p  =  .5.  where  F ®  = 
l/In  -t-  l).  Again,  results  are  accurate  to  at  least  10~3.  and  computational 
times  are  linear  in  n.  MCLNOR  could  not  be  used  to  evaluate  integrals  of  this 
dimensionality. 
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Finally,  Table  1C  displays  results  for  some  tail  probabilities  of  the  multi¬ 
variate  normal  distribution.  In  this  case,  the  upper  bound  of  the  integration 
was  -2.5,  n  =  (3,5,10),  and  p  =  (.5, .9).  These  probabilities  ranged  from  10~3 
to  10-5  and  accuracy  of  the  Clark  approximation  was  10~5  in  all  cases. 

5  Discussion 

Clark's  (1961)  formulae  for  the  moments  of  the  maximum  of  n  correlated  ran¬ 
dom  normal  variates  can  clearly  be  used  to  obtain  a  fast  and  accurate  ap¬ 
proximation  to  multivariate  normal  probabilities.  Examination  of  a  series  of 
examples  involving  special  cases  in  which  the  true  results  are  known,  reveals 
that  the  error  bound  for  the  approximation  is  approximately  10-3  regardless 
of  dimensionality,  and  that  accuracy  increases  with  increases  in  |  p  j.  These 
results  are  conservative  in  that  we  would  expect  the  ill  effect  of  using  normal 
approximations  to  be  greatest  when  pt  =  p.(i  =  l.n)  which  is  the  case  used 
in  the  illustrations. 

In  terms  of  computational  speed,  the  Clark  approximation  is  clearly  un¬ 
paralleled.  A  reasonable  estimate  of  the  speed  of  the  Clark  algorithm  is  given 
by. 


,  /.0004(n)\ 

speed  =  - - —  seconds 

\ meganop  J 

where  megafiop  is  the  number  of  scalar  floating  point  instructions  per  second 
that  the  computer  is  capable  of  performing. 

Numerous  applications  of  the  Clark  algorithm  suggest  themselves.  Some 
preliminary  work  in  this  area  has  already  been  conducted  by  Daganzo  (1984). 
in  the  context  of  discrete  choice  models  of  consumer  behavior,  and  by  Gibbons. 
Bock  and  Hedeker  ( 1987)  in  item-response  theory.  Other  potential  applications 
include  multivariate  generalizations  of  probit  analysis  (see  Ashford  and  Sow- 
den.  1970  for  the  bivariate  case),  and  random-effect  probit  models  (Gibbons 
and  Bock.  1987).  where  the  Clark  approximation  was  used  to  estimate  first- 
order  autocorrelation  among  the  residual  errors. 

Another  area  of  potential  interest  is  in  the  approximation  of  multivariate 
t  probabilities,  which  can  be  considered  as  the  joint  distribution  of  n  variates 

t,  =  :,/s.(i  =  1.2 . n)  where  the  c,  have  a  multivariate  normal  distribution 

with  zero  means  and  unknown  variance  a1,  and  known  correlation  matrix  {/v}. 
while  us‘ /<r‘  has  a  \ :  distribution  with  u  degrees  of  freedom  and  is  independent 
of  the  Dunnett  (1955)  has  evaluated  this  joint  density  for  the  case  of 

Pi)  =  P  =  5.  by  obtaining  . and  integrating  out  s.  Use 

of  the  Clark  algorithm  would  provide  a  generalization  of  their  result  to  the 


J 


case  of  general  { pt] },  a  natural  application  of  which  would  be  a  generalization 

of  Dunnett’s  test  to  the  case  of  unequal  sample  sizes  among  the  k  +  1  groups 

(i.e.,  treatment  groups  and  a  single  control). 
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Table  1 


Probability  that  n  Standard  Normal  Random  Variables  with 
Common  Correlation  p .  are  Simultaneously  <  h. 


A.  Comparison  with  MCLN’OR 


n 

h 

p 

True1 

MCLN'OR 

Clark 

Prob 

Time2 

Time3 

Prob 

Time3 

\ 

2.0 

.0 

.95170 

.95170 

.195 

.060 

.96185 

.0008 

4 

2.0 

.5 

.9284) 

.925'  15 

7.275 

1.760 

.94088 

.0012 

4 

2.0 

.8 

.94759 

.94758 

14.745 

2.914 

.94819 

.0012 

1 

2.0 

.9 

.95708 

.“5707 

l  •4.557 

4.855 

.95740 

.0012 

5 4 

1.0 

:■] 

.52111 

.52114 

10.151 

8.900 

.51441 

.0016 

-  \ 

i 

0.0 

.9 

.42957 

.42955 

NA 

98040 

.42921 

.0026 

-  \ 

< 

0.0 

2 

.04014 

.OIIKC 

NA 

744 

.04122 

.0026 

1  Gupta  ( I 964  I 
■’  Seconds  on  a  DEC  2060 

!  Seconds  on  a  COMPAQ  Weitek  -5167.  SVS  FORTRAN 

4  Accuracy  set  to  10“  1  instead  of  10” '  for  Ml'LNOR 


B.  Higher  Dimensional  Integrals 


Clark 

n 

h 

P 

Irue1 

P  rob 

Time 

10 

0 

.  *5 

.09001 

.08907 

.0041 

20 

0 

.*) 

01762 

.01657 

.0144 

10 

0 

.5 

.02149 

.02490 

.0459 

1  Fn i  0.0 . 0 :{.")}  i  =  ^ 


C.  Tail  Probabilities 


n  h 

/)  True1 

(.'lark 

4  -2.5 

.5  .00017 

.00021 

4  -2.5 

0  .1)0240 

.00241 

5  -2.5 

.5  .00004 

.00004 

4  -2.5 

0  .00157 

.00156 

10  -2.5 

.4  .00000 

.00000 

10  -2.5 

.9  .00099 

.00098 

1  Gupta 
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